modify char array in function || reading lines from file

**deathmetal** · 03-07-2021

I am reading a file with four columns, skipping the first (header) line. The file is tab/space delimited, and my goal is to get the last column.

I use a function whose parameter is a char array. The function itself runs fine, but it doesn't move the caller's char array pointer to the new location.

I cannot understand why, or how to move the char pointer to that location. Back in main, it prints the complete line: 1, start coord, end coord and gene.

Inside the function, the gene name is printed fine, that is, the pointer is moved correctly there.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>


#include "funcs_combinations.h"


#define MAX_LEN 200


void get_gene_name(char temp_line[])
{
    int space_count = 0; //get location of last space
    size_t itr = 0; //iterate
    size_t len_line = strlen(temp_line);


    while (itr < len_line)
    {
        if ((temp_line)[itr] == ' ' || (temp_line)[itr] == '\t')
        {
            space_count = itr; //store space location
        }
        itr++;
    }
    printf("space location was %d\n", space_count);
    temp_line = space_count + temp_line; //move pointer 
    printf("moved location is %p\n", (void *)temp_line);
    printf("location is %s\n", temp_line); //printing works fine here but goes bad in returning
    // return (temp_line);
}
//////////////////////////////


void remove_trailingspaces(char *newlines)
{


/*
This function is called while reading the first line


*/
    size_t length = strlen(newlines);
    size_t i = 0;
    for (i = 0; i < length; i++)
    {
        if (newlines[i] == '\n')
        {
            newlines[i - 1] = '\0';
            break;
        }
    }
}




//////////////////////////////
int main(int argc, char *argv[])
{
    FILE *fptr;
    char c[MAX_LEN];
    int line_number = 0;


    fptr = fopen(argv[1], "r");


    if (fptr == NULL)
    {
        printf("null pointer for file opening\n");
        return 1; //cannot continue without a file
    }


    while (fgets(c, MAX_LEN, fptr))
    {
        if (line_number > 0) //skip only the header line (line 0)
        {
            if (strcmp(c, "\n") != 0)
            {
                remove_trailingspaces(c);
                get_gene_name(c);
                printf("new gene is %s\n", c); // it doesn't move to new location 
                add_node(&start, c); //only if not new line
            }
        }


        line_number++;
    }


    fclose(fptr);


    return 0;
}

I have input file with lines as:

CHR START END GENE
1 1234 546 GENE1
1 2234 5346 GENE12
1 4234 5246 GENE14
1 6234 5546 GENE16

I want to get the last column, that is, the gene name (GENE1, GENE12 and so on), from the function get_gene_name.
Edit: itr for get_gene_name (while loop )

**laserlight** · 03-07-2021

To understand the problem, let's go back to a simple example:

Code:

#include <stdio.h>

void foo(int x)
{
    x = 123;
}

int main(void)
{
    int y = 0;
    foo(y);
    printf("%d\n", y);
    return 0;
}

As you probably can tell from reading the above code, the output of the above program is 0, not 123. The reason of course is that the assignment of 123 to x in foo only affects the local variable, not the variable from the caller. Now let's introduce a pointer:

Code:

#include <stdio.h>

void foo(int *x)
{
    *x = 123;
}

int main(void)
{
    int y = 0;
    foo(&y);
    printf("%d\n", y);
    return 0;
}

Now, the output is indeed 123, because by assigning 123 to *x, what x points to, i.e., y, is modified. Let's go back to your code. As you may know, this:

Code:

void get_gene_name(char temp_line[])

is equivalent to:

Code:

void get_gene_name(char *temp_line)

So let's go back to the pointer example, but make it something like what you did in your code:

Code:

#include <stdio.h>

void foo(int *x, int *p)
{
    x = p;
}

int main(void)
{
    int y = 0;
    int value = 123;
    foo(&y, &value);
    printf("%d\n", y);
    return 0;
}

Once again, the output is 0 instead of 123. The reason brings us back to the first example: the assignment of p to x only affects the local variable, not the variable from the caller.

**deathmetal** · 03-07-2021

Thank you.

I guessed it could be an issue with local and global variables.
Now I provide another array and copy into it, and it works.

Code:

#include <stdio.h>
#include <string.h>


void get_gene_name(char temp_line[], char *temp_copy)
{
    int space_count = 0; //get location of last space
    size_t itr = 0; //iterate
    size_t len_line = strlen(temp_line);




    while (itr < len_line)
    {
        if ((temp_line)[itr] == ' ' || (temp_line)[itr] == '\t')
        {
            space_count = itr; //store space location
        }
        itr++;
    }
    printf("space location was %d\n", space_count);
    temp_line = space_count + temp_line; //move pointer 
    printf("moved location is %p\n", (void *)temp_line);
    printf("location is %s\n", temp_line); //printing works fine here but goes bad in returning
    
    temp_copy[0]='\0';
    strcpy(temp_copy,temp_line);
    // return (temp_line);
}
//////////////////////////////
int main(void)
{


char copied[30];
    char length[80]="value is nuts with";
    get_gene_name(length, copied);
    //foo(&y, &value);
    printf("new value is %s\n",copied);
    
    return 0;
}

I have following two concerns:

1) is copying OK? I am looking at several genes (20-50K+)

2) How do I not run into problem of local-global in future? What I mean to ask is what is the rule of thumb when change is going to be made to the variable from caller and otherwise?

**laserlight** · 03-07-2021

Originally Posted by deathmetal

1) is copying OK? I am looking at several genes (20-50K+)

It is not wise to have the destination array smaller than the source array.

But you may not need an auxiliary array, e.g., I think you might want to do something like this instead:

Code:

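#include <ctype.h>
#include <string.h>

void trim_whitespace(char *text)
{
    size_t len = strlen(text);
    if (len == 0)
    {
        return;
    }

    // skip leading whitespace (cast: isspace needs an unsigned char value)
    char *first_non_space = text;
    while (isspace((unsigned char)*first_non_space))
    {
        ++first_non_space;
    }

    // step back over trailing whitespace
    char *last_non_space = text + len - 1;
    while (first_non_space < last_non_space && isspace((unsigned char)*last_non_space))
    {
        --last_non_space;
    }

    // copy the trimmed range to the front, left to right
    while (first_non_space <= last_non_space)
    {
        *text++ = *first_non_space++;
    }
    *text = '\0';
}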

I suggest this because:

From your post #1, you want to trim whitespace from the end.
get_gene_name seems to only be trimming whitespace from the front to get the gene name.
You can do an in-place copy, but not with strcpy because strcpy is not permitted to have overlapping arguments (because depending on how it is implemented, an overlap could result in a bug).
You could use memmove, which does allow overlapping arguments, but in this case it looks like it may be simpler to just implement the copy directly such that a bug due to overlap is avoided.

If your actual data is such that trimming whitespace from the front is uncommon, then you can make an optimisation to check if first_non_space == text, and if so, you only set:

Code:

*(last_non_space + 1) = '\0';

EDIT:
There's yet another possibility for optimisation though, which I realised I should mention after answering your second question: we could go back to your idea of pointer arithmetic for trimming whitespace from the start, but return the pointer instead of modifying the array, and only modify the array to trim whitespace from the end. In that case, you could do this:

Code:

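#include <ctype.h>
#include <string.h>

char *trim_whitespace(char *text)
{
    size_t len = strlen(text);
    if (len == 0)
    {
        return text;
    }

    // skip leading whitespace (cast: isspace needs an unsigned char value)
    char *first_non_space = text;
    while (isspace((unsigned char)*first_non_space))
    {
        ++first_non_space;
    }

    // step back over trailing whitespace
    char *last_non_space = text + len - 1;
    while (first_non_space < last_non_space && isspace((unsigned char)*last_non_space))
    {
        --last_non_space;
    }

    // cut off the tail in place; the front is skipped via the return value
    *(last_non_space + 1) = '\0';
    return first_non_space;
}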

To use this, you would do something that amounts to this:

Code:

char text[80];
// populate text
// ...
char *result = trim_whitespace(text);
// use result instead of text

It is arguably a little error-prone since a programmer using this function might expect it to change the text array in-place, so you'll have to document it carefully to avoid a bug being introduced under maintenance.

Originally Posted by deathmetal

2) How do I not run into problem of local-global in future? What I mean to ask is what is the rule of thumb when change is going to be made to the variable from caller and otherwise?

That's easy: when you assign to a parameter, ask yourself if you want the change to be reflected in the caller. If you do, then assigning to the parameter itself is wrong, even if the parameter is a pointer. Rather, you need a pointer to the caller's variable (and hence a pointer to a pointer if that variable is already a pointer) so that you can dereference it and assign through it, or you need to find some other way to accomplish the task, e.g., return a value.

**flp1969** · 03-07-2021

One question: since the lines (except the header) are fixed as
<int> <int> <int> <string>
where the string has no spaces, why not read a line and use sscanf() to get these values?

Code:

char buffer[MAX_BUFFSIZE];

// Ignore first line
if ( ! fgets( buffer, MAX_BUFFSIZE, fin ) )
{ ... error reading header ...}

while ( fgets( buffer, MAX_BUFFSIZE, fin ) )
{
  if ( sscanf( buffer, "%d %d %d %s", &v1, &v2, &v3, s ) != 4 )
  { ... error reading data... }

  // process data
}

**deathmetal** · 03-08-2021

Hi laserlight,
the code you shared (trim_whitespace(char *text), the version with *text = '\0'; at the end) doesn't work on my end.

By "doesn't work" I mean that I don't get the gene name, but the complete line.
It could be due to a space and a newline character together at the end of the line, as I copy-pasted the columns from Excel.

However, I used logic and modified my code:

Code:

void get_gene_name(char temp_line[])
{
    size_t field_start = 0; //index where the last field begins
    size_t itr = 0;
    size_t len_line = strlen(temp_line);


    while (itr < len_line)
    {
        if ((temp_line)[itr] == ' ' || (temp_line)[itr] == '\t')
        {
            field_start = itr + 1; //field begins after the separator
        }
        itr++;
    }


    itr = 0;       //iterate again
    //if no separator was found, field_start stays 0 and the line is kept whole
    while (field_start < len_line)
    {
        temp_line[itr++] = temp_line[field_start++]; //shift field to the front
    }
    temp_line[itr] = '\0';
}

This works fine and as intended. Please let me know how to optimize and improve on the code if possible.

Thank you

**laserlight** · 03-08-2021

My apologies, I misread both your description and your code: you're not doing a whitespace trim, but parsing to extract a field from a space-separated format. flp1969 in post #5 is right: for your particular parsing use case, fgets + sscanf would be better. You can skip storing the fields that you don't need by making use of * to suppress assignment, and then you just need to specify the field width for the field that you want to extract to avoid buffer overflow.

**deathmetal** · 03-08-2021

Originally Posted by laserlight

My apologies, I misread both your description and your code: you're not doing a whitespace trim, but parsing to extract a field from a space-separated format. flp1969 in post #5 is right: for your particular parsing use case, fgets + sscanf would be better. You can skip storing the fields that you don't need by making use of * to suppress assignment, and then you just need to specify the field width for the field that you want to extract to avoid buffer overflow.

No problems.

I will try the approach from post #5.
If you get a chance, can you please review the code I put in my last post?

Thank you laser!
